ABSTRACT


NOVA CPU IMPLEMENTATION WITH 2901 BIT SLICE 
by 
Larry Wayne Abbott 


Master of Science in Engineering 


There are several methods which can be used in the
design of a digital computer. Each of these approaches
has its advantages and its disadvantages. To learn the
trade-offs that apply to the bit slice and microprogram
methods, a partial build up of a NOVA CPU was done. In
the build up, special attention was focused on the
sequencing and control of the CPU. The Project Report
presents the outcome of the hardware build up and, in
particular, it addresses the issues involved in microcode
sequencing and decoding. Two methods of sequencing and
decoding are presented in detail. One method relies on
firmware to do all the sequencing and mode decoding, such
as address modes. The other method relies on firmware
and the Mapping PROM to do the sequencing and mode
decoding. This Project Report investigates the
implications of both methods on speed and memory
requirements for the CPU. Finally, this Project Report
presents technology trends, and investigates the potential
use of bit slice technology in future systems.

[...] firmware, and organizational requirements for a high
performance minicomputer design. To fulfill this goal a
computer central processing unit was constructed within
the constraints of the time and money available to me.

Several design criteria had to be considered,
balancing time and money available against firmware and
hardware goals. The results of this trade-off are as
follows:

1. The bit slice approach was chosen. This approach 
gives ease of interfacing the various elements of 
the computer. Bit slice fabrication technology 
is also capable of providing the speed necessary 
for a high performance minicomputer. 

2. The instruction set chosen was an emulation of 
the Data General NOVA 1200 set. It is relatively 
easy to implement, offers adequate power, and has 
a large expanding software base. In addition, 
at least two software compatible microprocessors 
exist, the Fairchild 9440 and the Data General 
MN601.

3. Only representative instructions would be
microprogrammed because of the large amount of time
required to microprogram an instruction.

4. Certain sections of the CPU would not be
completely built, and other sections would not be
built at all, because all the information wanted
could be learned without a full implementation.
For example, only one of the four Register and
Arithmetic Logic Units (RALU) is used because
[...] operation under the control of the computer
control unit. In addition, the RALU is an
expensive element, and any reduction in the
number of units used reduces the cost greatly.

5. The sequencer is the heart of the Computer
Control Unit (CCU), and the CCU is the heart of
the CPU; therefore the sequencer and the other
parts of the CCU (such as the microstore and
the pipeline register) must be fully implemented.

[...] the technique used in the CCU to control various phases of
the computer operation. The main techniques used in the
control of the computer are ring counters, random logic,
and microprogramming. In this paper the computer was
designed around a microprogrammed CCU. In conjunction
with the technique of microprogramming, a pipelined
architecture was adopted to increase the computer speed.


1.1 HARDWARE ADVANTAGES AND DISADVANTAGES 


The choice of the hardware and the architecture can
make the difference between a successful and a disastrous
design. Since a computer built up from SSI, MSI, and LSI
components is much more expensive to build in terms of
both time and hardware costs than the ubiquitous LSI
microprocessor, it is imperative that such a computer have
appreciably higher performance and flexibility than the
LSI microprocessor. A typical computer built with bit
slice techniques would require between fifty and one
hundred integrated circuits just for its CPU. The cost of
components for such a bit slice CPU starts at five hundred
dollars, as opposed to ten dollars for the LSI
microprocessor. It becomes obvious that high hardware costs
for a bit slice computer are a definite disadvantage and
that there must be performance gains to offset this
disadvantage if the bit slice approach is to be used.
This of course assumes that performance is needed in the
[...]

Can the bit slice approach provide the necessary
performance? One aspect of performance is the speed of
the technology being used. As can be seen from figure 1.1,
a bit slice computer using a bipolar technology such as
Schottky, low power Schottky, or ECL would provide the
kind of speed that is necessary.


FIGURE 1.1 Comparison of gate delay versus power
consumption for various technologies. Compiled
from data books of AMD, Fairchild, and Motorola.
(Axes: typical power dissipation per gate, in mW, and
gate delay, 1 ns to 100 ns. Legible curve labels:
Schottky, low power Schottky, PMOS.)

Picking the right integrated circuit technology is
obviously no insurance that high performance will be
[...] great importance. A well designed architecture will
simultaneously perform as many computer operations as
possible. This concurrency is achieved by using a
pipelined microprogram architecture. In this type of
architecture, a wide microprogram word, usually from 40 to
60 bits wide, is sent to an equally wide pipeline
register. This technique allows one microinstruction,
the one in the pipeline register, to be executed while
another microinstruction is fetched from the microprogram
memory (microstore).

Additional performance is gained from the width of
the microinstruction. A wide microinstruction can command
many actions at the same time, increasing the apparent
speed of the computer.

A microprogram approach provides another advantage.
Flexibility is a major strength of microprogramming. If
it is necessary to add or change instructions, the
microprogram can be easily changed. Most instruction sets
have many instructions in common, so it may not be
necessary to change all the microcode. It may be possible
to simply change the addresses in the Mapping PROM for
many of the instructions. So microprogramming makes the
design extremely flexible.


Since the bit slice design has the form of an 
iterative array that can be expanded by adding more cells, 
the bit slice approach allows easy expansion of the 
address bus and the data bus while allowing the rest of 
the CCU to remain substantially the same. The expansion 
ability gives the bit slice approach flexibility through 
modularity. 

The advantages and disadvantages of the concepts
introduced for the bit slice approach to computer design
are summarized in table 1.1.


TABLE 1.1

BIT SLICE COMPARISONS

ADVANTAGES                      DISADVANTAGES

SPEED: Bipolar technology       COST: Time to microprogram
and pipelining

FLEXIBILITY: Modularity
and microprogramming

TIME: Reduced hardware
design


1.2 FIRMWARE ADVANTAGES AND DISADVANTAGES 

The major advantage of firmware is the flexibility
gained from the microprogramming technique. The
disadvantage is the large amount of time it takes to write
microcode. The Advanced Micro Devices literature (AMD,
1977) describing the System 29 microprogramming
development system gives the following time and cost
figures for microprogramming. For manual microprogramming
"one word of microcode per day is allowed on U.S. Government
contracts. Three to five words of microcode per day
appears to be a reasonable standard on commercial projects
...". This means that a 1000 word microprogram would
take one man-year to accomplish, and even using the System
29 development system, it would take half a man-year to
develop 1000 words of microcode. It is clear that the
cost of microprogramming is a disadvantage. It is also
evident why only representative instructions were
microcoded for this project.
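
The man-year figure above can be checked with a few lines of
arithmetic. The sketch below is illustrative only: the words-per-day
rates are the ones quoted from the AMD literature, while the figure of
250 working days per year is an assumption that does not appear in the
source.

```python
# Rough effort estimate for hand-written microcode.
# WORKING_DAYS_PER_YEAR is an assumed value, not a figure from the report.
WORKING_DAYS_PER_YEAR = 250


def man_years(words: int, words_per_day: float) -> float:
    """Return the effort, in man-years, to write `words` words of microcode."""
    return words / words_per_day / WORKING_DAYS_PER_YEAR


if __name__ == "__main__":
    for rate in (1, 3, 5):  # government rate, then the commercial range
        print(f"{rate} word(s)/day: {man_years(1000, rate):.2f} man-years")
```

Under that assumption, the commercial rates of three to five words per
day put a 1000 word microprogram at roughly one man-year, in line with
the estimate in the text.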


[...] design comes at the expense of higher firmware cost.

After the decision has been made to design the
computer using the combined techniques of bit slicing,
pipelining, and microprogramming, there is the problem of
selecting which of the bit slice families to use. One
important criterion in selecting a device for a design is
availability. This involves more than finding out whether
or not the device is in stock. Availability involves
consideration of such questions as whether the family
is available from more than one distributor, and whether it
is available at a competitive price with good delivery
time. Without the purchasing power of a company,
availability takes on a new dimension. Distributors are not
eager to deal with an individual, especially in the small
quantities required for a one of a kind graduate project.
This latter consideration made the only practical choice of
a bit slice the 2900 series. This choice, however, is a
good one even under the normal commercial meaning of
availability, as shown below.

The available bit slices are shown in table 2.1.
From the table one could pick out the reasons that certain
bit slice families were not chosen. The following sections
are presented, however, to make it clear why certain
families were not chosen for this project. This is not to
say that these families are not well designed; actually,
some of the families are better suited than the 2900 for
some applications.


TABLE 2.1
BIT SLICE FAMILIES

Source: Electronic Design, 1977, page 60


Without going into too much detail, the decision not
to use the IMP was based on the technology and not on the
functional design of the IMP chips. First, the IMP is
implemented in PMOS, which means it is slow, too slow for
this application. Additionally, it uses multiple power
supplies, requires TTL level shifters to be TTL compatible,
and is not second sourced. Therefore the IMP was not thought
to be suitable for this application.


2.2 THE MOTOROLA 10800 FAMILY


The 10800, which uses ECL technology, is another
family that was not chosen because of the technology. In
this case the family is fast enough; in fact, it is too
fast. The speed is accompanied by high power consumption,
small (800 millivolt) logic swings, and by noise created by
the fast switching speeds. Further, the 10800 is not TTL
compatible because of the 800 millivolt logic swing. As a
result of all these disadvantages the 10800 family was not
chosen. ECL is the type of technology that is more
appropriate for high performance mainframe computers.


2.3 THE TEXAS INSTRUMENTS FAMILIES 


The two families considered from Texas Instruments
were implemented with Schottky and integrated injection
logic (I²L). The SN74S481 was the Schottky implementation,
and the SBP0400 and SBP0401 were the I²L implementation.
Neither of the Texas Instruments families was chosen;
however, each family was rejected for different reasons.

In both cases the software support is practically
non-existent, and, as has been pointed out earlier, microcoding
is time consuming and needs to be done on a
microprogramming development system for commercial applications.

The SBP0400 and SBP0401 were just too slow to be used.
In fact the shortest microinstruction time was 350 ns and
the maximum clock frequency was 3.3 MHz. Single chip 16
bit microprocessors can do as well.

The SN74S481 is an extremely fast (67 ns) and versatile
integrated circuit, but it does not fit into the
architecture of the computer being designed. If the SN74S481 were
used in this architecture its system speed would be much
less than the 67 ns instruction time indicates. This
paradox comes about because of the NOVA architecture. A
NOVA uses four accumulators in the CPU for working
registers; the SN74S481, on the other hand, uses 16
working registers in the main memory. To obtain four
accumulators, the SN74S481 must locate them in main memory,
and unless one is willing to accept the higher price and
complexity of high speed cache memory one must settle for
the more realistic 300 ns memory speed. This means that
the cycle time of the computer designed with the SN74S481
is based on 300 ns cycle times and not on the 67 ns of the
basic SN74S481. With cycle times approaching 367 ns the
SN74S481 would appear no better than the SBP0400. The
SN74S481 was not chosen for this reason.
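
The 367 ns figure follows from adding one main memory access to each
67 ns microinstruction. A minimal sketch of that arithmetic is given
below; the function name and the idea of counting memory accesses per
microinstruction are illustrative assumptions, and only the 67 ns and
300 ns values come from the text.

```python
CPU_CYCLE_NS = 67       # SN74S481 microinstruction time quoted in the text
MEMORY_CYCLE_NS = 300   # main memory speed quoted in the text


def effective_cycle_ns(cpu_ns: int, memory_ns: int, memory_accesses: int = 1) -> int:
    """Cycle time when each microinstruction must also wait for
    `memory_accesses` main memory cycles to reach its working registers."""
    return cpu_ns + memory_accesses * memory_ns


if __name__ == "__main__":
    total = effective_cycle_ns(CPU_CYCLE_NS, MEMORY_CYCLE_NS)
    print(f"effective cycle time: {total} ns")
```

With one memory access per microinstruction the memory term dominates,
which is why the fast slice offers no real advantage over the 350 ns
SBP0400 in this architecture.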
2.4 THE INTEL 3000


The Intel 3000 series has two problems associated
with it. First, the slice is only two bits wide, which
means that it would require twice as many chips as a four
bit slice to accomplish the same job. Secondly, the
sequencer (3001) addresses only 512 words of microprogram
memory; however, what is really difficult to live with is
the fact that the sequencer can not go from any location
in the microprogram store to any arbitrary location. The
addressing scheme divides the microprogram store into rows
and columns. The sequencer can only jump to locations in
the row or column of the originating microinstruction.
This was felt to be an unnecessary restriction, and, as a
result, the Intel 3000 was not chosen.
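
The effect of the row and column restriction can be illustrated with a
small model. This is a simplified sketch of the rule as stated above,
not a model of the 3001's actual jump instructions; the split of the
512 words into 32 rows of 16 columns is an assumption made for the
example.

```python
ROWS, COLS = 32, 16  # assumed organisation of the 512-word microstore


def split(address: int) -> tuple:
    """Split a microstore address into (row, column)."""
    if not 0 <= address < ROWS * COLS:
        raise ValueError("address outside the 512-word microstore")
    return divmod(address, COLS)


def can_jump(source: int, target: int) -> bool:
    """True if `target` lies in the same row or column as `source`."""
    src_row, src_col = split(source)
    dst_row, dst_col = split(target)
    return src_row == dst_row or src_col == dst_col


if __name__ == "__main__":
    reachable = sum(can_jump(0, target) for target in range(ROWS * COLS))
    print(f"locations reachable from address 0: {reachable} of {ROWS * COLS}")
```

Under these assumptions only a small fraction of the microstore is
reachable in a single jump, whereas a sequencer such as the 2909 or
2911 can reach every location.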


The Fairchild Macrologic family comes in two versions,
Schottky and CMOS. The CMOS version is too slow for this
project; however, the Schottky version is quite good. The
Schottky version, the 9405, is not quite as complex as the
2901; for instance, it does not offer the two port RAM, so
the RALU can not write and read the RAM at the same time
or read two RAM locations at the same time. As a
consequence, the 9405 comes in a smaller 24 pin package and
costs less than the 2901 ($12.00 versus $14.70 in 100
quantity). The complexity of the 2901 allows more to be
done in a microinstruction; however, it does not have an
edge in chip count over the 9405. The 9405 does not seem
to have as many support chips as the 2901, and in this area
some applications may give the edge in chip count to the
2901. On the other hand, both groups of support chips seem
to be well thought out, so a determination would have to
wait for a preliminary design with both families.

On a technical basis the tradeoffs between the 9400
series and the 2900 series would make the choice a
difficult one; in fact, for this application Fairchild has
implemented a NOVA 3 emulation with the 9400 series. Even
on a commercial availability basis the 9400 is acceptable.
While it is true that the 2900 has more second sources than
the 9400, the fact remains that the 9400 is second sourced.
Therefore, the only reason for making the choice of the
2900 was the lack of availability of the 9400 through the
low volume distributors that an individual must deal with.

2.6 MONOLITHIC MEMORIES 6700 


The Monolithic Memories 6700 appears to be the
forerunner of the 2900 series. As a consequence, there is
nothing the 6700 does that the 2900 cannot do better or
faster. Even the pinouts are similar, so it makes little
sense to pick the 6700 series.

2.7 ADVANCED MICRO DEVICES 2900 SERIES


There were many obvious reasons for selecting the
2900 series, such as availability (the 2900 has many
sources, including AMD, Fairchild, Monolithic Memories,
Motorola, National, Raytheon, Sescosem, and Signetics),
single power supply, TTL compatibility, and Schottky
speeds. However, these are the simple and obvious
advantages. The important advantages are less obvious and more
complicated.

The sequencer provides several of these advantages.
With the 2909 or 2911 sequencers, any address can be
reached from any other address in the microstore. The next
address can also be reached via a four word
microinstruction stack. The 2911 is discussed in greater detail
in the section on hardware implementation, along with the
other components of the 2900 chip set.

Another strong point of the 2900 series is the two
port RAM in the 2901 RALU. With a two port RAM several
RAM operations can occur during one microcycle. For
instance, the contents of register A14 can be added to the
contents of register B3 and loaded back into B3 with a left
shift, all in one microcycle. The resulting throughput of
the machine is much greater than its clock rate would
indicate.
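
The register operation just described can be sketched in software. The
model below is a deliberately simplified illustration, not a
description of the 2901's internal logic: it assumes a 16 bit word
(four slices), shifts a zero into the low bit, and ignores the carry,
the Q register and the status outputs. All names are hypothetical.

```python
WORD_BITS = 16                 # assumed: four 4-bit slices side by side
MASK = (1 << WORD_BITS) - 1


def add_shift_microcycle(registers: list, a_address: int, b_address: int) -> int:
    """Model one microcycle: read the A and B ports together, add them,
    shift the sum left one place, and write the result back through B."""
    a_value = registers[a_address]          # A port read
    b_value = registers[b_address]          # B port read, same cycle
    total = (a_value + b_value) & MASK      # ALU add, carry out discarded
    registers[b_address] = (total << 1) & MASK
    return registers[b_address]


if __name__ == "__main__":
    regs = [0] * 16
    regs[14], regs[3] = 3, 5
    print(add_shift_microcycle(regs, a_address=14, b_address=3))
```

A single port register file would need separate cycles for the two
reads and the write; the two port RAM is what lets the whole sequence
fit in one microcycle.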

The large selection of support chips, such as the
Am2930 Program Control Unit (PCU), makes the 2900 an
especially powerful set from a total system point of view.

All the preceding elements are more thoroughly
discussed in the section on hardware implementation.
However, an area that is not discussed in detail elsewhere
in the paper but is of utmost importance is the
microprogram development system available for the 2900
series. It is not the only system available (see table
2.1 on bit slice families), but it is coupled with what is
perhaps the best of all the bit slices.

Several versions of the development software are
available. The first version is the AMDASM microcode
assembler, which is available on national time sharing.
Later developments are the System 29 and its implementation
via floppy disk on the Intel 8080 development system.
Since the System 29 runs under the control of an Am9080,
its software is compatible with other hardware systems that
use the 8080 or its derivatives.

The Advanced Micro Devices Microprogramming Handbook
(AMD, 1976a) contains the example shown as figure 2.1 of the
use of the AMDASM microcode assembler with AMD's CCU design
(figure 11 in the Microprogramming Handbook). Additional
information can be obtained from Advanced Micro Devices or
Raytheon in handbooks describing AMDASM or RAYASM in more
detail.

Figure 2.1 contains several microprogramming examples
done with AMDASM. The examples assume AMD's design for a
CCU (AMD, 1976a). It should be noted that there is a great
deal of similarity between AMD's CCU design and the CCU
designed for this paper.

AMD's CCU uses 26 of the bits in the 64 bit wide
microinstruction word. Table 2.2 describes the 26 bits
and their functions by dividing the microinstruction into
[...]


; THIS IS AN AMDASM MICROPROGRAM ASSEMBLY EXAMPLE.
; AMDASM REQUIRES TWO PHASES: DEFINITION AND ASSEMBLY.
; FOLLOWING IS THE DEFINITION PHASE, AND THE DEFINITIONS
; REFER TO FIGURE 11.

        WORD 64             ; DEFINE A 64 BIT MICROINSTRUCTION

; THE FIVE MAIN CCU FIELDS ARE AS FOLLOWS:
;   M0-M11:  A 12 BIT NUMERICAL FIELD USED TO
;            SUPPLY THE PIPELINE BRANCH ADDRESS
;            OR COUNTER LOAD VALUE.
;   M12-M15: THE AM29811 INSTRUCTION
;   M16-M20: CONDITION CODE TEST SELECT & POLARITY CONTROL
;   M21:     INSTRUCTION REGISTER READ-IN
;   M22-M25: THE AM29803 INSTRUCTION

; DEFINE THE DEFAULT PIPELINE BRANCH FIELD.
; IT WILL FORCE THE MICROPROGRAM TO THE HIGHEST
; MICROPROGRAM MEMORY LOCATION IF LEFT IN DEFAULT FORM.

NUMB:   DEF 52X, 12V%Q#7777

; DEFINE THE CONDITIONAL TEST SELECT FIELD AND POLARITY CONTROL.
; DEFAULTS ARE: NONINVERTED AND UNCONDITIONAL.
; TESTS ARE ACTIVE LOW!

TEST:   DEF 43X, 4V%:D#0, 1VB#0, 16X

CNTR:   EQU 15              ; COUNTER ZERO TEST SELECT
INV:    EQU B#1             ; POLARITY CONTROL

; DEFINE THE AM29811 NEXT ADDRESS CONTROL UNIT
; INSTRUCTION MNEMONICS.

JZ:     DEF 48X, H#0, 12X   ; JUMP ZERO
CJS:    DEF 48X, H#1, 12X   ; CONDITIONAL JUMP SUBROUTINE
JMAP:   DEF 48X, H#2, 12X   ; JUMP MAP
CJP:    DEF 48X, H#3, 12X   ; CONDITIONAL JUMP PIPELINE
PUSH:   DEF 48X, H#4, 12X   ; PUSH/CONDITIONAL LOAD COUNTER
JSRP:   DEF 48X, H#5, 12X   ; COND. JUMP SUBROUTINE REGISTER/PIPELINE
CJV:    DEF 48X, H#6, 12X   ; CONDITIONAL JUMP VECTOR
JRP:    DEF 48X, H#7, 12X   ; CONDITIONAL JUMP REGISTER/PIPELINE
RFCT:   DEF 48X, H#8, 12X   ; REPEAT FILE LOOP ON COUNTER .NE. ZERO
RPCT:   DEF 48X, H#9, 12X   ; REPEAT PIPELINE ON COUNTER .NE. ZERO
CRTN:   DEF 48X, H#A, 12X   ; CONDITIONAL RETURN
CJPP:   DEF 48X, H#B, 12X   ; CONDITIONAL JUMP PIPELINE & POP
LDCT:   DEF 48X, H#C, 12X   ; LOAD COUNTER & CONTINUE
LOOP:   DEF 48X, H#D, 12X   ; TEST END LOOP (CONDITIONAL LOOP ON FILE)
CONT:   DEF 48X, H#E, 12X   ; CONTINUE
JP:     DEF 48X, H#F, 12X   ; JUMP PIPELINE


FIGURE 2.1 An example of AMDASM microcode assembly
From: AMD, 1976a, page 1-16


; THE DEFAULT FOR DATA BUS READ-IN OF INSTRUCTION REGISTER IS DISABLE

IR:     DEF 42X, 1VB#1, 21X
IN:     EQU B#0

; DEFINE THE AM29803 16-WAY BRANCH CONTROL UNIT
; INSTRUCTION MNEMONICS.

NOT:    DEF 38X, H#0, 22X
T0:     DEF 38X, H#1, 22X
T1:     DEF 38X, H#2, 22X
T01:    DEF 38X, H#3, 22X
T2:     DEF 38X, H#4, 22X
T02:    DEF 38X, H#5, 22X
T12:    DEF 38X, H#6, 22X
T012:   DEF 38X, H#7, 22X
T3:     DEF 38X, H#8, 22X
T03:    DEF 38X, H#9, 22X
T13:    DEF 38X, H#A, 22X
T013:   DEF 38X, H#B, 22X
T23:    DEF 38X, H#C, 22X
T023:   DEF 38X, H#D, 22X
T123:   DEF 38X, H#E, 22X
T0123:  DEF 38X, H#F, 22X

        END                 ; END OF DEFINITION PHASE

; BEGIN ASSEMBLY PHASE


FIGURE 2.1 continued
From: AMD, 1976a, page 1-17


; EXAMPLE 1.
; VISUALIZE A 16-BIT PROCESSOR IN A REAL-TIME ENVIRONMENT
; GATHERING AND MANIPULATING DATA. PART OF THIS DATA ARRIVES
; IN 8-BIT BYTES SO SWAPPING IS NECESSARY. ALSO, THERE ARE
; TWO CONTROL SIGNALS WHICH REQUIRE IMMEDIATE ATTENTION
; WHEN ACTIVE. ASSUME THAT THESE CONTROL SIGNALS ARE CONNECTED
; TO T2 AND T3 OF THE AM29803 16-WAY BRANCH CONTROL UNIT. FOLLOWING
; IS THE AMDASM OUTPUT FOR THIS EXAMPLE'S ASSEMBLY PHASE,
; WHICH INCLUDES THE SOURCE LISTING AND OUTPUT BIT PATTERN.
; IN THIS EXAMPLE, THE MICROPROGRAM STARTS AT LOCATION
; 0360 OCTAL. AS MENTIONED EARLIER, THE ALU PORTION OF
; THESE EXAMPLES IS NOT DEALT WITH.

        ORG H#0F0
SWAP:   NUMB Q#6* & TEST , & LDCT & T0123
        RFCT & TEST CNTR , & T0123
        CJV & TEST , & T0123

        ORG H#0F4
ORTEST2:
        TEST , & JP & NUMB H#1F0    ; #2 HANDLER AT LOCATION 1F0

        ORG H#0F8
ORTEST3:
        TEST , & JP & NUMB H#2F0    ; #3 HANDLER AT LOCATION 2F0

        ORG H#0FC
ORTEST23:
        TEST , & JP & NUMB H#3F0    ; #2 AND #3 HANDLER AT LOC 3F0

        END

00F0  XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX XXXXXX1111X00000 1100111111111001
00F1  XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX XXXXXX1111X11110 1000XXXXXXXXXXXX
00F2  XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX XXXXXX1111X00000 0110XXXXXXXXXXXX
00F4  XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX XXXXXXXXXXX00000 1111000111110000
00F8  XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX XXXXXXXXXXX00000 1111001011110000
00FC  XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX XXXXXXXXXXX00000 1111001111110000


FIGURE 2.1 continued
From: AMD, 1976a


; EXAMPLE 2.
; ALIGNMENT CAN BE REALIZED IN ONE MICROINSTRUCTION. ASSUME
; THAT F3 OF THE MOST SIGNIFICANT ALU SLICE IS CONNECTED TO
; TEST 13 OF THE CONDITION MULTIPLEXERS. NOTE THAT NEGATIVE
; NUMBERS CAN BE ALIGNED IN THE SAME MANNER BY SIMPLY
; OMITTING THE VARIABLE "INV". ALSO, IF THE COUNTER IS CLEARED
; BEFORE STARTING ALIGNMENT, IT WILL CONTAIN THE NUMBER OF
; SHIFTS REQUIRED TO DO THE ALIGNMENT (OR THE COMPLEMENT
; IF USING AM25LS165 COUNTERS).

        ORG Q#770
ALIGN:  NUMB Q#770 & TEST 13, INV & RPCT    ; (ALU TO SHIFT UP)
        END

01F8  XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX XXXXXXXXXXX11011 1001000111111000


; EXAMPLE 3.
; A DIVISION ROUTINE. ASSUME F = 0 OF THE ALU IS CONNECTED TO
; TEST 12 (AND F3 TO TEST 13 AS BEFORE), AND SIXTEEN
; DIVISION STEPS ARE REQUIRED. IF THE FINAL REMAINDER IS NEGATIVE, IT MUST BE
; RESTORED BY ADDING IT TO THE DIVISOR. THE VECTOR INPUT IS SET UP
; FOR THE ERROR ROUTINE. NOTE USAGE OF THE AMDASM CONVENTION
; "$" TO DENOTE THE CURRENT PROGRAM COUNTER.

        ORG Q#1000
DIVIDE: LDCT & TEST , INV & NUMB Q#14*      ; (ALU OUTPUTS DIVISOR)
        TEST 12, INV & CJV                  ; IF = 0: ERROR
        RPCT & TEST CNTR , & NUMB $         ; LOOP
        TEST 13, INV & NUMB $+2 & CJP       ; IF <0, CORRECT
        TEST , & JMAP                       ; EXIT TO MAP
        TEST , & JMAP                       ; ALU ADDS REMAINDER TO DIVISOR, EXIT MAP
        END

0200  XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX XXXXXXXXXXX00001 1100111111110011
0201  XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX XXXXXXXXXXX11001 0110XXXXXXXXXXXX
0202  XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX XXXXXXXXXXX11110 1001001000000010
0203  XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX XXXXXXXXXXX11011 0011001000000101
0204  XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX XXXXXXXXXXX00000 0010XXXXXXXXXXXX
0205  XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX XXXXXXXXXXX00000 0010XXXXXXXXXXXX


FIGURE 2.1 continued
From: AMD, 1976a


TABLE 2.2

CCU ASSEMBLY DEFINITIONS

Bits     Field Description          Parameters To Be Used

25-22    Am29803 Instructions

21       Instruction Register

20-16    Test and Polarity          The test number (1-14) in
                                    decimal, and CNTR, for test
                                    select (unconditional by
                                    default); INV for test polarity
                                    (noninverted by default)

15-12    Instructions

11-0     Numerical Field            Any 4 digit (12 bit) octal
                                    number

Source: AMD, 1976a, page 1-15
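
As an illustration of how these fields combine into a microword, the
sketch below packs the 26 defined bits using the positions given in the
AMDASM definition phase of figure 2.1. It is not part of AMDASM: the
function and parameter names are hypothetical, and the "don't care"
bits, which the assembler prints as X, are simply left at zero here.

```python
# (bit position of least significant bit, width) for each CCU field,
# taken from the DEF statements in figure 2.1.
FIELDS = {
    "numb": (0, 12),        # M0-M11: branch address or counter load value
    "am29811": (12, 4),     # M12-M15: next address control instruction
    "polarity": (16, 1),    # M16: test polarity (INV = 1)
    "test": (17, 4),        # M17-M20: condition test select
    "ir_read_in": (21, 1),  # M21: instruction register read-in
    "am29803": (22, 4),     # M22-M25: 16-way branch instruction
}


def encode(**values: int) -> int:
    """Pack the named field values into one integer microword."""
    word = 0
    for name, value in values.items():
        shift, width = FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
    return word


if __name__ == "__main__":
    # ALIGN: NUMB Q#770 & TEST 13, INV & RPCT   (example 2 of figure 2.1)
    word = encode(numb=0o770, am29811=0x9, test=13, polarity=1)
    print(format(word & 0xFFFF, "016b"))
```

For the alignment example the low sixteen bits come out as
1001000111111000, the RPCT instruction followed by the branch address,
which is the pattern the assembler lists for that microinstruction.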


The computer design in this paper incorporates the
techniques of microprogramming and pipelining. A basic
functional block diagram of this type of system is shown
in figure 3.1. The functional blocks perform various
phases of the computer's operation and are listed below
along with a brief description of the function of each. A
more detailed presentation of each element is given in the
section on the hardware implementation. This section is a
simple overview to familiarize the reader with the total
architecture.


3.1 INSTRUCTION REGISTER 


The instruction is clocked into the instruction
register from the data bus when the pipeline register
sends the proper command. The instruction is held in the
instruction register (IR) until the pipeline register
commands another IR load. From the instruction register
the instruction is routed to the Mapping PROM, RALU, PCU,
and the input and output (I/O) control.


3.2 MAPPING PROM 


The Mapping PROM contains the address to the starting
points of each instruction's microcode in the microstore.
By routing the instruction from the instruction register
to the Mapping PROM, the correct starting address for the
instruction's block of microcode is generated.


FIGURE 3.1 Functional block diagram of microprogram
type computer. (Legible block labels: data bus, Map PROM,
sequencer control, sequencer, pipeline, tri-state buffer.)

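
The Mapping PROM is in effect a lookup table from instruction bits to
microstore addresses. The following sketch shows the idea only. The
choice of the upper five instruction bits as the PROM address, the
table contents and the default entry are all assumptions made for
illustration and are not taken from the design described in this
report.

```python
PROM_ADDRESS_BITS = 5   # assumed: upper five bits of the 16-bit instruction
DEFAULT_START = 0x000   # hypothetical entry for unimplemented instructions


def build_map_prom(entries: dict) -> list:
    """Return the PROM contents as a list indexed by the PROM address."""
    prom = [DEFAULT_START] * (1 << PROM_ADDRESS_BITS)
    for prom_address, microcode_start in entries.items():
        prom[prom_address] = microcode_start
    return prom


def microcode_start(prom: list, instruction: int) -> int:
    """Look up the microstore starting address for a 16-bit instruction."""
    prom_address = (instruction >> (16 - PROM_ADDRESS_BITS)) & (
        (1 << PROM_ADDRESS_BITS) - 1
    )
    return prom[prom_address]


if __name__ == "__main__":
    # Hypothetical starting addresses for two instruction groups.
    prom = build_map_prom({0b00100: 0x040, 0b01000: 0x048})
    print(hex(microcode_start(prom, 0b0010_0000_0000_0000)))
```

Changing an instruction's behaviour then amounts to writing new
microcode and repointing one PROM entry, which is the flexibility
argument made in section 1.1.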
3.3 SEQUENCER 


The sequencer selects the proper source for the next
microinstruction address. The next address may come from
the Mapping PROM, the pipeline register, the sequencer's
stack, or the sequencer's R register. The source for the
next instruction depends upon the situation, as will be
seen later in the examples of microcode.

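
The selection among these four sources can be pictured as a
multiplexer with a small stack attached. The class below is a
behavioural sketch under simplifying assumptions: it models only the
four sources named above, it leaves out the microprogram counter and
its incrementer, and it raises an error on stack overflow, which is a
choice made for the example and not the behaviour of the 2909 or 2911.

```python
from enum import Enum


class Source(Enum):
    MAP_PROM = "map"
    PIPELINE = "pipeline"
    STACK = "stack"
    R_REGISTER = "r"


class Sequencer:
    STACK_DEPTH = 4  # four word stack, as described for the 2909/2911

    def __init__(self):
        self.stack = []
        self.r_register = 0

    def push(self, address):
        """Save a return address on the microinstruction stack."""
        if len(self.stack) >= self.STACK_DEPTH:
            raise OverflowError("sequencer stack is full")
        self.stack.append(address)

    def next_address(self, source, map_address=0, pipeline_address=0):
        """Return the next microinstruction address from the chosen source."""
        if source is Source.MAP_PROM:
            return map_address
        if source is Source.PIPELINE:
            return pipeline_address
        if source is Source.STACK:
            return self.stack.pop()
        return self.r_register


if __name__ == "__main__":
    seq = Sequencer()
    seq.push(0x040)                                   # subroutine call
    print(hex(seq.next_address(Source.PIPELINE, pipeline_address=0x1F0)))
    print(hex(seq.next_address(Source.STACK)))        # subroutine return
```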
3.4 MICROPROGRAM 


The output of the sequencer is sent to the
microprogram memory and the output of the microprogram memory
is clocked into the pipeline register on the next clock
pulse. Once in the pipeline register, the instruction is
executed by sending commands to the RALU, PCU, MAR, MBR,
sequencer, and other computer elements. The microprogram
sends the sequencer the code for the next address source
and may also send the branch address if the situation
calls for it. This process allows the previous pipeline
word to fetch the next microcode while the present code is
being executed. This parallel operation allows the
computer to run twice as fast as would be possible if the
system were processing the microcode serially. Figure 3.2
illustrates how this overlapping operates.
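
The factor of two can be seen by counting clock periods. The sketch
below assumes, for illustration, that a microinstruction fetch and a
microinstruction execution each take one clock period; the function
names are hypothetical.

```python
def serial_cycles(count: int, fetch: int = 1, execute: int = 1) -> int:
    """Clock periods needed when each fetch must finish before execution
    starts and each execution must finish before the next fetch."""
    return count * (fetch + execute)


def pipelined_cycles(count: int, fetch: int = 1, execute: int = 1) -> int:
    """Clock periods needed when the next fetch overlaps the current
    execution: one initial fetch, then one overlapped step per
    microinstruction, ending with the last execution alone."""
    if count == 0:
        return 0
    return fetch + (count - 1) * max(fetch, execute) + execute


if __name__ == "__main__":
    n = 100
    print(serial_cycles(n), pipelined_cycles(n))
```

For long microinstruction sequences the ratio of the two counts
approaches two, which is the speed-up claimed for the pipelined
arrangement.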


FIGURE 3.2 [...] and the effective increase in throughput.
The instruction is a direct-indexed addition. (Legible
labels: fetch instruction; decode instruction; fetch base
address; form effective address; operation and save result;
effective duration of each ADD DX instruction = 3 micro
cycles; actual duration of each ADD DX instruction = 6.)
From: Muething, 1976


The pipeline register is loaded from the microprogram
and is usually between 40 and 60 bits wide. These bits
are formed into many different fields which control the
different elements of the computer. For instance, the
pipeline register has fields to control the next
microinstruction address sources, the function of the ALU,
the operation of the I/O unit, the program addressing, and
all the associated registers.

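
Splitting a pipeline register word into its control fields is a matter
of shifting and masking. The sketch below uses the CCU field positions
from AMD's example in figure 2.1, since those are the only bit
positions given in this part of the report; the field layout of the
machine built for this project may differ, and the names are
hypothetical.

```python
# (bit position of least significant bit, width) per field, as in figure 2.1.
CCU_FIELDS = {
    "numb": (0, 12),
    "am29811": (12, 4),
    "polarity": (16, 1),
    "test": (17, 4),
    "ir_read_in": (21, 1),
    "am29803": (22, 4),
}


def decode(word: int) -> dict:
    """Return each CCU field of a pipeline register word as an integer."""
    return {
        name: (word >> shift) & ((1 << width) - 1)
        for name, (shift, width) in CCU_FIELDS.items()
    }


if __name__ == "__main__":
    fields = decode(0x1B91F8)
    print(fields["am29811"], oct(fields["numb"]), fields["test"])
```

In hardware no decoding step is needed: each field is simply a group
of pipeline register outputs wired to the element it controls.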
3.6 RALU 


The 2901 contains 16 two port registers, a Q register,
an 8 function ALU, output multiplexer, and shifting
capability (the more complex 2903 also has provisions
to do two's complement and floating point operations).

3.7 PROGRAM CONTROL UNIT (PCU) 


The PCU is a powerful integrated circuit that contains
a stack, incrementer, and other necessary elements to do
most forms of addressing, including direct, indirect,
indexed, and relative.


4.0 SELECTION OF THE INSTRUCTION SET


Deciding on which instruction set to use in the
computer design was the most agonizing part of the design.
The goal was an instruction set that was both easy to
implement and had a large existing software base. The
choice soon narrowed down to either the PDP-11 or NOVA
1200 instruction set.

The PDP-11 instruction set (figure 4.1) has one of the
largest bases of existing software. In addition, Digital
Equipment Corporation and Western Digital produce a PDP-11
software compatible microprocessor. The availability of
the microprocessors would allow the development of software
on a relatively large and fast system, such as the Am2900,
and allow total software transfer to a small dedicated
system later if desired. However, the PDP-11 instruction
set is particularly difficult to implement because of its
poorly structured op code field. This op code field
difficulty would have required expensive Programmable
Logic Arrays (PLA) to implement the instruction set. Since
the PLA's are not reprogrammable, this would have been cost
prohibitive on a one of a kind system.

As can be seen from figure 4.2, the NOVA instruction
set is simpler and more clearly structured than the PDP-11
instruction set. There are separate fields for the ALU
[...]


1. Single Operand Group (CLR, CLRB, COM, COMB, INC, INCB,
   DEC, DECB, NEG, NEGB, ADC, ADCB, SBC, SBCB, TST, TSTB,
   ROR, RORB, ROL, ROLB, ASR, ASRB, ASL, ASLB, JMP, SWAB,
   MFPS, MTPS, SXT, XOR)
   Format: OP Code (bits 15-6), DD (bits 5-0)

2. Double Operand Group (BIT, BITB, BIC, BICB, BIS, BISB,
   ADD, SUB, MOV, MOVB, CMP, CMPB)

3. Program Control Group
   a. Branch (all branch instructions)
   b. Jump To Subroutine (JSR)
   c. Subroutine Return (RTS)
   d. Traps (break point, IOT, EMT, TRAP, BPT)
   e. Mark (MARK)
   f. Subtract 1 and branch (if ≠ 0) (SOB)

4. Operate Group (HALT, WAIT, RTI, RESET, RTT, NOP)

5. Condition Code Operators (all condition code instructions)

6. Fixed and Floating Point Arithmetic (optional EIS/FIS)
   (FADD, FSUB, FMUL, FDIV, MUL, DIV, ASH, ASHC)


PIGURE 4.1. PDP=11° instruction set 
From: DEC, page 33. 
9440 INSTRUCTIONS

MEMORY REFERENCE (without register and with register)

  ADDRESSING:  PAGE ZERO
               PC RELATIVE
               INDEXED VIA AC2
               INDEXED VIA AC3
  Each form may be DIRECT or INDIRECT.

Memory Reference instructions without register are used for
branching (JMP, JSR) without involving accumulators. These
instructions are also used for modifying memory (ISZ, DSZ).
Memory Reference instructions with register are used to
move 16-bit words between the memory and the accumulators.

ARITHMETIC/LOGIC

  SKIP
  000  NOTHING
  001  SKIP ALWAYS
  010  SKIP ON ZERO CARRY
  011  SKIP ON NON-ZERO CARRY
  100  SKIP ON ZERO RESULT
  101  SKIP ON NON-ZERO RESULT
  110  SKIP IF EITHER CARRY OR RESULT ZERO
  111  SKIP IF BOTH CARRY AND RESULT NON-ZERO

  FUNCTION               SHIFT CODE               CARRY CODE
  000  COMPLEMENT        00  DO NOTHING           00  CURRENT CARRY
  001  NEGATE            01  ROTATE LEFT ONCE     01  ZERO
  010  MOVE              10  ROTATE RIGHT ONCE    10  ONE
  011  INCREMENT         11  BYTE SWAP            11  COMPLEMENT CURRENT CARRY
  100  ADD COMPLEMENT
  101  SUBTRACT
  110  ADD
  111  AND

  LOAD CONTROL
  0  LOAD RESULT IN DST. AC
  1  DO NOT LOAD RESULT IN DST. AC

Arithmetic/Logic instructions perform arithmetic (ADD, ADC,
INC, NEG, SUB) or Boolean (AND, COM, MOV) operations on the
contents of two registers. The result of each operation,
together with the Carry bit, can be rotated and tested for
skip conditions as part of the same arithmetic/logic
instruction; loading in the destination register is
optional.

INPUT/OUTPUT

  DEVICE CODE: USED TO SELECT ONE OF 64 DEVICES

  CONTROL
  00  NOTHING
  01  START I/O DEVICE
  10  CLEAR I/O DEVICE
  11  PULSE SPECIAL FUNCTION

  FUNCTION
  000  NO I/O TRANSFER
  001  DATA IN A
  010  DATA OUT A
  011  DATA IN B
  100  DATA OUT B
  101  DATA IN C
  110  DATA OUT C
  111  SKIP ON BUSY OR DONE

Input/Output instructions move data between the 9440
accumulators and three buffers in the peripheral device
interface. These instructions also perform control
functions in the I/O device and test the status flags in
both the peripheral circuitry and the central processor.

FIGURE 4.2 Fairchild 9440 emulation of the NOVA 1200
instruction set
From: Wilnai, 1977, page

function, shift, and carry. The op code field does not
have a variety of lengths as the PDP-11 does. This makes
the implementation of the Mapping PROM easy, since EPROM
can be used. There is the added plus of a large and
expanding software base for the NOVA instruction set, and,
as in the case of the PDP-11, there are microprocessors
available that are compatible with the NOVA instruction
set. One microprocessor is from Data General (the MN601)
and the other is from Fairchild (the 9440).

It should be noted that the NOVA instruction set is 
not a particularly demanding set for the Am2900. For 
instance, the 2900 can easily provide information on 
comparisons such as A greater than B, or A less than B; 
the NOVA instruction set can not use this information and 
must take several steps to arrive at the same decision. 

The NOVA instruction set has access to the four
accumulators which are present in the NOVA architecture.
However, since the Am2901 has sixteen registers, twelve of
which the NOVA cannot use, the NOVA instruction set cannot
take full advantage of the Am2901's capability.

When all the factors are considered, the NOVA
instruction set is an acceptable choice for this project
because it has adequate power and a straightforward
structure. Figure 4.2 and figure 4.3 show the structure of
the instruction set and the function of each instruction.
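
The regular field structure described above can be illustrated with a short sketch. This is not part of the original hardware or microcode; it assumes the standard NOVA arithmetic/logic word layout (bit 0 is the most significant bit), and the names `decode_alc`, `FUNCTIONS`, `SHIFTS`, `CARRIES`, and `SKIPS` are hypothetical.

```python
# Illustrative sketch only. Field positions follow the standard NOVA
# arithmetic/logic format; all names here are hypothetical.

FUNCTIONS = ["COM", "NEG", "MOV", "INC", "ADC", "SUB", "ADD", "AND"]
SHIFTS = ["none", "rotate left", "rotate right", "byte swap"]
CARRIES = ["current", "zero", "one", "complement"]
SKIPS = ["never", "always", "carry zero", "carry non-zero",
         "result zero", "result non-zero",
         "either zero", "both non-zero"]


def decode_alc(word):
    """Split a 16-bit arithmetic/logic instruction into its fields."""
    if not (word >> 15) & 1:
        raise ValueError("not an arithmetic/logic instruction")
    return {
        "acs": (word >> 13) & 0o3,            # source accumulator
        "acd": (word >> 11) & 0o3,            # destination accumulator
        "function": FUNCTIONS[(word >> 8) & 0o7],
        "shift": SHIFTS[(word >> 6) & 0o3],
        "carry": CARRIES[(word >> 4) & 0o3],
        "no_load": bool((word >> 3) & 1),     # 1 = do not load result
        "skip": SKIPS[word & 0o7],
    }
```

Because every field sits at a fixed position, each one can be extracted with a shift and a mask; this is the property that lets a simple PROM lookup replace the PLA decoding that the PDP-11 format would have needed.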


Mnemonic  Octal code  Operation

Memory reference instructions

DSZ       014000      Decrement location E* by 1 and skip
                      if result is zero.
ISZ       010000      Increment location E by 1 and skip if
                      result is zero.
JMP       000000      Jump to location E
JSR       004000      Load PC+1 in AC3 and jump to subroutine
                      at location E
LDA       020000      Load contents of location E into AC
STA       040000      Store AC in location E

Arithmetic and logical instructions

          102000      Add the complement of ACS† to ACD†
          103000      Add ACS to ACD
          103400      AND ACS with ACD
          100000      Place the complement of ACS in ACD
          101400      Place ACS+1 in ACD
          101000      Move ACS to ACD
          100400      Place negative of ACS in ACD
          102400      Subtract ACS from ACD
          073101      If overflow, set Carry. Otherwise divide
                      AC0-AC1 by AC2. Put quotient in
                      AC1, remainder in AC0.
          073301      Multiply AC1 by AC2, add product to
                      AC0, put result in AC0-AC1

Input/output instructions

          060400      Data in, A buffer to AC
          061400      Data in, B buffer to AC
          062400      Data in, C buffer to AC
          061000      Data out, AC to A buffer
          062000      Data out, AC to B buffer
          063000      Data out, AC to C buffer
          060000      No operation
          063400      Skip if Busy is 1
          063500      Skip if Busy is 0
          063600      Skip if Done is 1
          063700      Skip if Done is 0

Stack manipulation instructions

          060201      Move contents of frame pointer to AC
          061201      Move contents of stack pointer to AC
          060001      Move contents of AC to frame pointer
          061001      Move contents of AC to stack pointer
          061601      Move top word on stack to AC and
                      decrement stack pointer
          061401      Increment stack pointer and move
                      contents of AC to top of stack
          062601      Restore accumulators, program counter
                      and carry from last return block
                      on stack
          062401      Push a five-word return block on stack
          062077      Set up interrupt-disable flags
                      according to mask in AC
          071077      Enable interrupts from CPU real-time
                      clock
          065077      Disable interrupts from CPU real-time
                      clock
          100010      Software interrupt (ALC format no-skip,
                      no-load)

Central processor control instructions

HALT      063077      Halt the processor
INTA      061477      Acknowledge interrupt by loading code of
                      nearest device that is requesting an
                      interrupt into AC bits 10 to 15
INTDS     060277      Disable interrupt by clearing interrupt ON
INTEN     060177      Enable interrupt by setting interrupt ON
IORST     061077      Clear all I/O devices

* Location E refers to a location with an address computed
  using the address bits of the word and either the PC, AC2,
  or AC3.
† ACS and ACD refer to source and destination accumulators,
  each selected by a 2-bit section of the instruction.

FIGURE 4.3 microNOVA instruction set
5.0 STATE TRANSITION DIAGRAM


The state transition diagram depicts, in graphical
form, the sequence of events that the computer can go
through in each cycle. The state transition diagram,
shown as figure 5.1, is presented in the form typical of
other state diagrams and should be self explanatory.
[States shown in the diagram: RESET/POWER UP; HALT; FETCH
INSTRUCTION; MEMORY REFERENCE (base page, PC relative, via
AC2, via AC3); ARITH/LOGICAL EXECUTE; I/O, STACK, PROCESS
CONTROL; FETCH DATA; EXECUTE]

FIGURE 5.1 State transition diagram


6.0 HARDWARE IMPLEMENTATION


Figure 6.0 shows the system design of the 2900 16 bit
minicomputer. Figure 6.0 refers to the detailed drawings
(figures 6.1 through 6.12) of the computer functions. The
following sections describe the hardware implementation of
these functions in detail.
6.1 IR AND MAPPING PROM


The computer starts its cycle by loading the instruction
register (figure 6.1 in the system design). The
instruction register latches the data and sends it on to
the Mapping PROM. In a final design the Mapping PROM
would consist of high speed bipolar PROM. For the
development system, however, relatively slow ultraviolet
erasable EPROM was used because it provided more
flexibility and was more cost effective than throwing away
a set of PROMs each time the microcode was changed. This
necessarily slowed down the computer, since the EPROMs
used (the 2708) have access times of 450 ns as compared
to the 50 ns that is typical of microstore speeds.

The Mapping PROM sends the proper starting address to
the sequencer (figure 6.2 in the system design), where one
of several options will be performed. For instance, a
micro-subroutine may be called or returned from, or a
branch to a location selected by the pipeline register may
be executed, or a branch to the location addressed by the
Mapping PROM may be executed. The appropriate action
depends upon the next address and mode fields of the
pipeline register.

FIGURE 6.0 System architecture for 16 bit computer CPU
FIGURE 6.1 Instruction register and Mapping PROM
FIGURE 6.4 2901 RALU array
FIGURE 6.5 Arithmetic and logic unit
FIGURE 6.7 Memory Address Register
FIGURE 6.8 Program Control Unit
FIGURE 6.10 Microstore and Pipeline Register
FIGURE 6.11a Physical layout of CCU
FIGURE 6.11b Photograph of the physical layout of the CCU
FIGURE 6.12 MODE implementation circuit

The next address field controls the operation of the
Am29811 sequencer controller, which will be discussed in
detail later.

What is important about the Am29811 now is the fact
that it does not allow for a jump to a subroutine given by
the address from the Mapping PROM. There are many
variations on the basic NOVA 1200 instructions, including
direct, indirect, relative, and base page addressing
combined with the shifts and skips that are a part of the
ADD instruction. To decode these instructions into unique
microcode routines requires either a large amount of
memory, or a large amount of time, or a clever compromise
to keep the amount of Mapping PROM and microstore down
while not slowing the computer down too much.

One of the options, the most memory intensive and the
fastest, is to microcode each instruction permutation
and map each variation into its unique block of microcode.
This approach requires a much larger Mapping PROM and
microstore, but the computer does not waste time decoding
instruction permutations with firmware.


Another method is to determine each permutation by
jumping into microcode subroutines to determine the proper
mode and returning when the determination is complete.
This method is slow because of the time required to jump
to a microcode subroutine, do a firmware determination of
mode, and return from the subroutine.

A more practical approach is to use the Mapping PROM
as a decoder. To do this, the Mapping PROM must be
available as a mapping function during normal modes, but
must be converted into an instruction decoder during the
appropriate modes. In this way, the computer goes to the
proper instruction microcode without the use of firmware
subroutines. Since there are several mode types, such as
addressing, shifting, and skipping, and since the Am29811
does not have a MAP subroutine instruction, logic has to
be added to accomplish calls to the subroutine pointed to
by the Mapping PROM.

The way to implement the MAP subroutine instruction
with the least amount of hardware would be to combine the
next address and mode fields into one field and replace
the Am29811 with a PROM large enough to decode the combined
fields. However, the need for the mode field was not
realized until after the hardware was constructed.
Besides, I did not have access to the necessary PROM
programmer. The next best approach was to construct the
logic needed by using a 74LS251 tri-state eight input
multiplexer as a universal logic module in combination
with the unused half of a 74LS139 one of four decoder.
During the addressing, shifting, and skipping modes the
logic inverts the pipeline and Mapping PROM enable lines
so the jump to a pipeline subroutine is converted into a
jump to the MAP subroutine. The circuit is shown as figure
6.12.
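
The effect of the added MODE logic can be sketched behaviorally. The following is only an illustration of the inversion described above, not a gate-level model of the circuit in figure 6.12; it assumes active-low enable lines and a single boolean standing for "one of the decode modes is active", and the function name `steer_enables` is hypothetical.

```python
# Behavioral sketch only (assumptions: active-low enables, one boolean
# flag for "addressing, shifting, or skipping mode is active").

def steer_enables(pl_enable_n, map_enable_n, decode_mode):
    """Return the (pipeline, Mapping PROM) enables seen by the buses.

    pl_enable_n  -- 0 when the sequencer controller selects the
                    pipeline branch address
    map_enable_n -- 0 when it selects the Mapping PROM output
    decode_mode  -- True during a decode mode
    """
    if decode_mode:
        # Invert both lines: a subroutine jump that would have taken
        # its address from the pipeline register now takes it from
        # the Mapping PROM.
        return 1 - pl_enable_n, 1 - map_enable_n
    return pl_enable_n, map_enable_n
```

In normal mode the enables pass through unchanged, so existing microcode that branches through the pipeline register is unaffected.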
6.2 SEQUENCER


The sequencer is the heart of the computer's
microprogrammed architecture. The sequencer controls the
execution of the microprogram with its own instruction set
(see table 6.1), but the sequencer is, in turn, controlled
by the sequencer controller, a 29811.

The sequencer controller is part of the next address
circuit (see figure 6.3). The next address circuit
performs several functions. First, the next address
circuit can test up to 16 test inputs and send the results
to the 29811 sequencer controller, which uses the
information to decide the source of the next sequencer
address. Secondly, in addition to the test input, the
next address source for the sequencer is determined by
the pipeline register commands to the sequencer controller.
These pipeline register commands are from the sequencer
controller's own instruction set (see tables 6.2, 6.3,
and 6.4).
TABLE 6.1
SEQUENCER COMMANDS

Address Selection

S1  S0   SOURCE FOR Y OUTPUTS    SYMBOL
L   L    Microprogram Counter    uPC
L   H    Register                REG
H   L    Push-Pop stack          STK0
H   H    Direct inputs           Di

Output Control

The Y outputs carry the source selected by S0 S1, or are
placed in the High Impedance state.

Stack Control

FE  PUP  PUSH-POP STACK CHANGE
H   X    No change
L   H    Increment stack pointer, then push current PC
         onto STK0
L   L    Pop stack (decrement stack pointer)

H = High
L = Low
X = Don't Care

Comments and principal uses listed for the combined
S1, S0, FE, PUP codes:

  Pop Stack (End Loop)
  Push uPC (Set-up Loop)
  Pop Stack; Use AR for Address (End Loop)
  Push uPC; Jump to Address in AR
  Jump to Address in STK0; Pop Stack
  Jump to Address in STK0; Push uPC
  Jump to Address in STK0 (Stack Ref, Loop)
  Pop Stack; Jump to Address on D
  Jump to Address on D; Push uPC

X = Don't care, 0 = LOW, 1 = HIGH, Assume Cn = HIGH
Note: STK0 is the location addressed by the stack pointer.

Source: AMD, 1976a, page 2-6
TABLE 6.2

Am29811 INSTRUCTION SET

MNEMONIC  INSTRUCTION
JZ        Jump to Address Zero
CJS       Conditional Jump-to-Subroutine with Jump Address
          in Pipeline Register
JMAP      Jump to Address at Mapping PROM Output
CJP       Conditional Jump to Address in Pipeline Register
PUSH      Push Stack and Conditionally Load Counter
JSRP      Jump-to-Subroutine with Starting Address
          Conditionally Selected from Am2911 R-Register or
          Pipeline Register
CJV       Conditional Jump to Vector Address
JRP       Jump to Address Conditionally Selected from Am2911
          R-Register or Pipeline Register
RFCT      Repeat Loop if Counter is not Equal to Zero
RPCT      Repeat Pipeline Address if Counter is not Equal
          to Zero
CRTN      Conditional Return-from-Subroutine
CJPP      Conditional Jump to Pipeline Address and Pop Stack
LDCT      Load Counter and Continue
LOOP      Test End of Loop
CONT      Continue to Next Address
JP        Jump to Pipeline Register Address

Source: AMD, 1976a, page 2-19


TABLE 6.3

Am29811 FUNCTION TABLE

[Columns: MNEMONIC; INSTRUCTION INPUTS I3 I2 I1 I0;
FUNCTION; TEST INPUT; NEXT ADDR SOURCE. Functions legible
in the table: JUMP ZERO; JUMP MAP; PUSH/COND LD CNTR;
REPEAT LOOP, CNTR ≠ 0; REPEAT PL, CNTR ≠ 0; COND RTN;
COND JUMP PL & POP; LOAD CNTR & CONTINUE; CONTINUE;
JUMP PL]

L = LOW       DEC = Decrement
H = HIGH      *LL = Special Case
X = Don't Care

Source: AMD, 1976a, page 2-20
TABLE 6.4
Am29811 TRUTH TABLE

[Columns: MNEMONIC; FUNCTION; INPUTS; NEXT ADDR SOURCE.
Functions legible in the table: JUMP ZERO; COND JUMP PL;
PUSH/COND LD CNTR; COND JSB R/PL; COND JUMP VECTOR;
COND JUMP R/PL; REPEAT LOOP, CTR ≠ 0; REPEAT PL, CTR ≠ 0;
COND RTN; LD CNTR & CONTINUE; TEST END LOOP; CONTINUE]
The result is rather like pulling yourself up by your
bootstraps. The sequencer is the heart of the
microprogrammed architecture, but it is controlled by the
next address circuit. The next address circuit is
controlled by the pipeline register, which is controlled by
the microstore. Finally, the microstore is controlled by
the sequencer and the circle is complete.
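
The address selection and stack behavior summarized in table 6.1 can be sketched as a small model. This is a simplified illustration, not the Am2911 itself: the class name `Sequencer`, the string source names, and the way a full stack discards its oldest entry are assumptions made for the example.

```python
# Toy model of a microprogram sequencer (illustrative assumptions:
# string source names, a Python list for the stack, and a carry-in
# that is always HIGH so the counter always increments).

class Sequencer:
    def __init__(self, stack_depth=4):
        self.upc = 0              # microprogram counter
        self.reg = 0              # address register
        self.stack = []           # push-pop stack, top is stack[-1]
        self.stack_depth = stack_depth

    def clock(self, source, direct=0, push=False, pop=False):
        """Select the next microstore address, then update the state."""
        if source == "UPC":
            y = self.upc
        elif source == "REG":
            y = self.reg
        elif source == "STK0":
            y = self.stack[-1]
        elif source == "D":
            y = direct
        else:
            raise ValueError("unknown address source: " + source)

        if push:
            # Save the current counter as the return address.
            self.stack.append(self.upc)
            if len(self.stack) > self.stack_depth:
                del self.stack[0]   # simplification of overflow
        elif pop:
            self.stack.pop()

        self.upc = y + 1            # counter follows the address sent out
        return y
```

A subroutine call and return then look like this: `clock("D", direct=20, push=True)` jumps to address 20 while saving the return address, and a later `clock("STK0", pop=True)` resumes at the saved address.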
6.3 MICROPROGRAM MEMORY AND THE PIPELINE REGISTER


Once the proper address is chosen for the microprogram
memory (the source of the address may be the Mapping PROM,
the pipeline register, or the sequencer), the address is
transmitted to the microprogram memory's address lines.
For the development system the microprogram memory is
ultraviolet erasable PROM and requires 450 ns before
the data appears at the output of the microprogram memory.

When the data appears at the output of the microprogram
memory it is loaded into the pipeline register. The
pipeline register can be seen as a latch into which the
microprogram word is stored. While the microprogram word
is stored in this latch two actions take place. First, the
current microprogram word, the one in the pipeline
register, is executed. At the same time, the sequencer is
instructed to fetch the next microprogram word. By the
time the current microinstruction is finished executing,
the next microinstruction will be waiting at the input to
the pipeline register.

As can be seen from the computer design (figure 6.1),
the microprogram word is rather long. The microprogram
word is composed of the following major fields:

1. Test Condition Field - This field contains the
   code that selects one of the 16 test inputs.

2. Polarity Field - This field determines the
   polarity of the test input chosen.

3. IR Field - This field is used to load the
   instruction register.

4. Microprogram Branch Field - This field provides
   the next address for a microprogram branch.

5. ALU Source Field - This field selects two data
   sources for the ALU function from among the 16
   RALU registers, Q register, data inputs, and zero.

6. ALU Destination Field - This field selects one of
   the RALU's 16 general purpose registers as the
   destination of the results of the current ALU
   operation.

7. ALU Function Field - This field selects one of the
   eight ALU functions to operate on the source data.

8. SALU Field - This field enables the ALU outputs.

9. Shift Field - This field determines how the shift
   will be performed in the ALU.

10. ALU Carry Field - This field determines how the
    carry will be used in the ALU.

11. PCU Carry Field - This field determines how the
    carry will be used in the program control unit.

12. PCU Address Field - This field determines the
    function of the PCU.

13. E Field - This field enables the output of the
    PCU to the MAR.

14. MODE Field - The mode field determines the mode of
    the Mapping PROM. It specifies whether the
    Mapping PROM is to be used as a mapping function
    or as one of the decoding functions.
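
The field list above can be pictured as a packed word. The sketch below is only an illustration of packing and unpacking such a word: the field names are paraphrased from the list, and every bit width in `FIELD_WIDTHS` is an assumption chosen for the example (the source gives no widths beyond what 16 test inputs and eight ALU functions imply).

```python
# Illustrative sketch. All widths are assumed values, not the widths
# used in the actual microstore.

FIELD_WIDTHS = [
    ("test_condition", 4),     # 16 test inputs fit in 4 bits
    ("polarity", 1),
    ("ir_load", 1),
    ("branch_address", 12),
    ("alu_source", 3),
    ("alu_destination", 4),
    ("alu_function", 3),       # eight ALU functions fit in 3 bits
    ("alu_output_enable", 1),
    ("shift", 2),
    ("alu_carry", 2),
    ("pcu_carry", 1),
    ("pcu_address", 5),
    ("pcu_output_enable", 1),
    ("mode", 2),
]


def pack(fields):
    """Concatenate the named fields, first field most significant."""
    word = 0
    for name, width in FIELD_WIDTHS:
        value = fields.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Recover the field values from a packed microprogram word."""
    fields = {}
    for name, width in reversed(FIELD_WIDTHS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields
```

With these assumed widths the word is 42 bits long, which gives a feel for why the text calls the microprogram word "rather long".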
6.4 RALU


The output from the various fields of the pipeline
register drives the functional blocks of the computer. One
of the blocks is the 2901 RALU, which performs all the
arithmetic and logical operations required by the
computer. The RALU contains 16 general purpose registers
(implemented with dual port RAM), a Q register (for
temporary storage such as required in multiplication), an
eight function ALU, and data routing circuits. The
interaction of the various RALU components is controlled
by the contents of the ALU source, destination, and
function fields of the pipeline register. Table 6.5
contains the various RALU functions that are commanded by
the pipeline register.


TABLE 6.5
ALU OPERATION

[The table has three parts, each indexed by micro code:
ALU Source Operand Control; ALU Function Control; and a
destination part giving the RAM function, Q-REG function,
Y output, RAM shifter, and Q shifter. It is followed by
the Source Operand and ALU Function Matrix.]

X = Don't care. Electrically, the shift pin is a TTL input
internally connected to a three-state output which is in
the high-impedance state.
B = Register addressed by B inputs.
Up is toward MSB, Down is toward LSB.
+ = Plus; - = Minus; V = OR; Λ = AND; ∀ = EX-OR

Source: AMD, 1976, page 8


The Program Control Unit (PCU) is the final block to
be discussed. The PCU is shown in figure 6.8. The PCU
is implemented using a special integrated circuit (Am2930)
designed for controlling the program addresses. Like so
many other functional blocks in the 2900 computer, it too
has a very powerful instruction set (see table 6.6). The
PCU allows the program counter to be incremented by only
one control bit. Other addressing modes, such as relative,
indexed, and base page, can be handled almost as easily.
In addition, the PCU provides for easy implementation of
such operations as jump to subroutine, return from
subroutine, and the associated stack manipulations.
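
The address arithmetic that the PCU has to support for the NOVA addressing modes can be sketched in software. This is a minimal illustration assuming the standard NOVA memory reference format (8-bit displacement, 2-bit index field, 1-bit indirect flag) and a single level of indirection; the function name `effective_address` and the dictionary used for memory are hypothetical, and auto-increment locations and multi-level indirection are left out.

```python
# Sketch under stated assumptions: standard NOVA memory reference
# format, 15-bit addresses, one level of indirection only.

ADDRESS_MASK = 0o77777


def effective_address(word, pc, ac2, ac3, memory):
    """Compute the effective address E of a memory reference word."""
    displacement = word & 0o377
    index = (word >> 8) & 0o3
    indirect = (word >> 10) & 1

    if index == 0:
        # Page zero: displacement is an unsigned address 0..255.
        address = displacement
    else:
        # Relative and indexed modes: displacement is signed.
        if displacement & 0o200:
            displacement -= 0o400
        base = (pc, ac2, ac3)[index - 1]
        address = (base + displacement) & ADDRESS_MASK

    if indirect:
        address = memory[address] & ADDRESS_MASK
    return address
```

Each non-zero index value is a base-plus-displacement sum, which corresponds to the kind of sum (for example PC+D or R+D) that appears among the PCU instructions in table 6.6.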


TABLE 6.6

PCU INSTRUCTION SET

[Columns: instruction number; I4 I3 I2 I1 I0; CC; IEN;
instruction; output; next state after the clock for the PC,
R, stack, and SP.]

Instruction disable: IEN = H

Unconditional instructions (0 to 15):

   0  RESET
   1  FETCH PC
   2  FETCH R
   3  FETCH D
   4  FETCH R+D
   5  FETCH PC+D
   6  FETCH PC+R
   7  FETCH S+D
   8  FETCH PC→R
   9  FETCH R+D→R
  10  LOAD R
  11  PUSH PC
  12  PUSH D
  13  POP S
  14  POP PC
  15  HOLD

Conditional instructions (16 to 31). When the test fails
(FAIL COND'L TEST) each behaves as FETCH PC:

  JUMP R
  JUMP D
  JUMP "0"
  JUMP R+D
  JUMP PC+D
  JUMP PC+R
  JSB R
  JSB D
  JSB "0"
  JSB R+D
  JSB PC+D
  JSB PC+R
  RETURN S
  RETURN S+D
  HOLD
  SUSPEND

PC - Program Counter
R  - Auxiliary Register
SP - Stack Pointer
D  - Direct inputs

Notes: 1. When IEN is HIGH, the Y0-Y3 outputs contain the
          same data as when IEN is LOW, as determined by
          I0-I4 and CC.
       2. Z = High impedance state (outputs "OFF").
       3. - = No change

Source: AMD, 1977a
7.0 SYSTEM TIMING


As mentioned earlier, this computer was designed with
high performance as a goal. However, for economy the
microprogramming was done in ultraviolet erasable EPROM.
This approach saves money because a new set of PROMs is not
required each time the microcode is changed, but the
computer runs rather slowly as a result of the 450 ns access
time of the 2708 EPROM. In order to present a clear
picture of the system timing, two timing diagrams are
given. The timing diagrams for the experimental system
(figure 7.1) and for the potentially high performance
system (figure 7.2) are presented along with a brief
discussion of each diagram.


7.1 TIMING FOR THE EXPERIMENTAL SYSTEM


In order to establish the timing characteristics of
the computer, the chain of events that takes the longest
time to complete must be identified. This chain of events
is the critical timing path for the computer. By
definition no computer operation can take longer to
complete than the critical path. The critical path
is the limiting factor for the speed of the computer.
In other words, the time of one computer cycle can be no
shorter than the time required for the critical path.
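
Finding the critical path amounts to summing the delays along each chain of events and taking the largest sum. The sketch below illustrates that calculation only; apart from the 450 ns EPROM access time quoted in the text, every delay value and path name in `EXAMPLE_PATHS` is a made-up placeholder and not a reading from figure 7.1.

```python
# Illustrative sketch. The delays are placeholder values (only the
# 450 ns EPROM access time comes from the text).

EXAMPLE_PATHS = {
    "control": [
        ("pipeline register", 50),
        ("sequencer", 110),
        ("microstore EPROM", 450),
    ],
    "data": [
        ("pipeline register", 50),
        ("RALU", 120),
        ("status register", 55),
    ],
}


def path_delay(stages):
    """Total worst-case delay of one chain of events, in ns."""
    return sum(delay for _, delay in stages)


def critical_path(paths):
    """Return (name, delay) of the slowest path; the clock period
    cannot be shorter than this delay."""
    name = max(paths, key=lambda key: path_delay(paths[key]))
    return name, path_delay(paths[name])
```

With the placeholder numbers, `critical_path(EXAMPLE_PATHS)` picks the control path, because the slow EPROM dominates it.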
[Timing diagram. Traces shown: CLOCK; PL REG (data valid);
Am29811 (next address source valid); MAP LS138; MAP PROM
(next address valid); SEQUENCER; MICROSTORE (input to
pipeline); 2901; 2901 LOOK AHEAD; 2901 Cn; STATUS REG;
SHIFT OUT; SHIFT IN]

FIGURE 7.1 Timing diagram for the experimental system


The computer cycle begins on the leading edge of the
clock. On the leading edge the microinstruction is
clocked into the pipeline register. As can be seen from
figure 7.1 (all times are shown as worst case times), the
use of the 2708 EPROM for the microstore and again for the
Mapping PROM in the decoder mode causes large delays.
Figure 7.1 is set up for a 1.4 microsecond cycle, but the
time could have been shortened by 260 nanoseconds.
7.2 HIGH PERFORMANCE SYSTEM


Since the objective is to design a high performance
minicomputer, the experience gained from the experimental
system needs to be examined to gain a reasonable
expectation of the potential performance of the 2900
system. In the experimental version, 2708 EPROMs were
used, and since these devices were so slow, low power
Schottky registers and counters were used to save cost and
power. However, for the full performance version
envisioned, 50 ns PROM or ROM and full speed Schottky
would be required. This is the premise of the timing
presented in figure 7.2. Figure 7.2 uses the same time
scale as figure 7.1 to give a better illustration of the
performance increases gained by these changes.

[Timing diagram. Traces shown: CLOCK; PL REG (data valid);
Am29811 (next address source valid); MAP LS138; MAP PROM;
SEQUENCER; MICROSTORE; 2901; 2901 LOOK AHEAD; 2901 Cn;
STATUS REG; SHIFT OUT; SHIFT IN]

FIGURE 7.2 Timing diagram for the full performance system

The basic cycle time of the high performance computer
is 270 ns. With the overhead for checking for halts and
interrupts, an ADD without a shift or skip can be done in
1.08 microseconds. While this is representative of
modern minicomputer performance, it is not as impressive
as was hoped for when the design started. The fault lies
in the decoding scheme used for indirect addressing,
relative addressing, mode 1, and the various other modes
of operation that are decoded. As mentioned earlier, the
decoding scheme drastically cuts the amount of Mapping PROM
and microstore PROM needed, but what are the costs of this
scheme? When compared to the 1.5 microsecond ADD time of
Fairchild's 9440 single chip NOVA microprocessor, it is
not clear that the improvement gained by using a
microprogrammed bit slice technology is worth the cost in
time and money.

Two approaches can be taken to increase speed. One
approach would allow for extra time in the cycle only when
the Mapping PROM is called as the next address source. The
other approach would use extra Mapping PROMs to provide a
12 bit address that could directly address 4096 words of
microstore. In the latter approach, each variation of an
instruction would be addressed directly. This would
eliminate the need to decode each instruction, but would
add to the microprogramming task. As a result, the ADD
time would drop to about 880 ns, and with newer and faster
RALUs and sequencers, a cycle time of less than 800 ns could
be achieved. This figure is more in the range of
performance that was first envisioned.
7.3 COMPARISON OF THE 2900 SYSTEM TO EXISTING SYSTEMS


In table 7.1 comparisons are made between some of the
existing variations of NOVA type machines. In this
comparison the BLAZE, which is Fairchild's 9400 bit slice
(see section 2.5) emulation of the NOVA 3, and the NOVA 4
are in the same class as the 2900 emulation of NOVA. The
SPARK, FLAME, and NOVA 1200 are marginally slower than the
2900 emulation. The conclusion is that the design of the
2900 system is faster than present single chip NOVA
implementations, and is in fact just as fast as other
bipolar implementations.


TABLE 7.1
NOVA COMPUTER COMPARISONS

SPEED IN MICROSECONDS

Column headings in the original: 9440, SPARK-16, BLAZE-16,
FLAME-16, NOVA 1200, with notes (a), (b), and (c).

INSTRUCTION
Load Accumulator      1.0     2.55
Store Accumulator     1.0     2.55
ISZ, DSZ              1.4     3.15
Jump                  . 06    1.35
Jump to Subroutine    1.2     1.35
Add                   1.0     1.35
Subtract              1.0     1.35
And                   1.0     1.35
Move                  1.0     1.35
Skip                  0       1.35
I/O Input             1.6     2.55
I/O Output            1.6     3.4

(a) Oscillator frequency - 10 MHz, memory Read cycle - 400 ns.
(b) Minimum for semiconductor memory, maximum for 16K core.
(c) Oscillator frequency - 12 MHz, Memory Busy = 120 ns.

Source: Suri, 1977, page 10


8.0 MICROPROGRAM


Because of the amount of time required to write
microcode, only a few of the instructions were
microcoded. The microcode, timing diagrams, and comments
for these instructions are included in the following
figures. The microcode is also presented as it appears in
the microstore memory.

The following are some of the constraints adopted for
writing microcode:

1. The code must be as short as possible.

2. The code must generate the memory address and the
   memory read signal as far in advance as possible.
   This practice allows slower memories to be used.

3. The microcode must do as much in parallel as
   possible. This will speed up the throughput.


The application of the first constraint can best be
illustrated by comparison of the ADD instructions. The
original ADD instruction (figure 8.1) was quite slow
because of the repeated use of jumps to subroutines to
accomplish the various modes of the ADD instruction. For
instance, the ADD instruction jumps to a shift subroutine
in T1 and then jumps to a skip subroutine in T2. Each of
these jumps requires at least two steps: the jump and the
return. As a result, eight microcycles are required,
whereas the final ADD instruction (figure 8.2) actually
uses only four microcycles. Instead of jumping to and
from a subroutine, the microcode jumps to the shift
subroutine then jumps directly to the skip subroutine under
control of the Mapping PROM. Finally the microcode jumps
directly to the fetch subroutine under control of the
pipeline branch address.
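
The microcycle counts above can be checked against the times quoted for the two ADD versions (2.16 microseconds in figure 8.1, 1.08 microseconds in figure 8.2) at the 270 ns cycle time of section 7.2. The helper below is only a worked-arithmetic sketch; the function name `instruction_time_ns` is hypothetical.

```python
# Worked arithmetic only: instruction time = microcycles x cycle time.

def instruction_time_ns(microcycles, cycle_ns):
    """Execution time of an instruction in nanoseconds."""
    return microcycles * cycle_ns


original_add = instruction_time_ns(8, 270)   # eight microcycles
final_add = instruction_time_ns(4, 270)      # four microcycles

print(original_add, final_add)   # 2160 ns and 1080 ns
```

Halving the number of microcycles halves the ADD time, which is the whole benefit of letting the Mapping PROM steer the shift and skip jumps.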

The second constraint is shown clearly in the timing
diagram for the FETCH instruction (figure 8.7). Here one
can see the MR and LD MAR lines go low during T1. This
generates the memory read signal and the memory address
during T1. However, the memory is not needed until the
IR LOAD in T3, which allows the memory two microcycles for
set up. At the 270 ns cycle time, even cheap 500 ns memory
can be used. If the faster CCU with the 200 ns cycle time
suggested in section 7.2 is used, memory as slow as 400 ns
can be used.
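
The memory budget follows from the two microcycles between the memory read in T1 and the IR LOAD in T3. Below is a small sketch of that check, assuming the memory access time simply has to fit inside those two microcycles; the helper names are hypothetical.

```python
def memory_budget_ns(cycle_ns, cycles_available=2):
    """Time available to the memory between the read signal and IR LOAD."""
    return cycle_ns * cycles_available


def memory_is_fast_enough(memory_ns, cycle_ns, cycles_available=2):
    """True if a memory with the given access time fits in the budget."""
    return memory_ns <= memory_budget_ns(cycle_ns, cycles_available)


# 270 ns is the cycle time as built; 200 ns is the faster CCU of section 7.2.
for cycle_ns, memory_ns in ((270, 500), (200, 400)):
    budget = memory_budget_ns(cycle_ns)
    ok = memory_is_fast_enough(memory_ns, cycle_ns)
    print(f"cycle {cycle_ns} ns: budget {budget} ns, {memory_ns} ns memory ok: {ok}")
```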

The third constraint is shown in the ADD instruction.
For instance, in the ADD with no shift and no skip, the
first cycle contains a fetch and increment of the program
counter, a read from RAM ports A and B with a store back
into port B, and a mode 2 jump to the shift subroutine.
That is a lot of work to do in one microinstruction.


ADD (original)      TIME: 2.16 microseconds min.      LOCATION 028

CYCLE  MICROCODE                    COMMENT

       B <- A+B, Fetch PC,          Add A plus B and load the sum into B as
       LD CARRY                     addressed by the IR. Output the present PC
                                    and increment by 1. Load the carry from the
                                    add into the carry bit.

       MODE 2 <- 1, CJS ONET MAP    Jump to the MODE 2 subroutine given by the
                                    jump address in the Mapping PROM to get the
                                    proper shift mode.

       MODE 3 <- 1, CJS ONET MAP    Jump to the MODE 3 subroutine given by the
                                    jump address in the Mapping PROM to get the
                                    proper skip mode.

       MR, JP FETCH(002)            Load the PC into the MAR and send a memory
                                    read signal. Jump to T2 in the FETCH
                                    subroutine.

       see FETCH


FIGURE 8.1 RTL and comments for original ADD microcode




ADD with NO SHIFT and NO SKIP      TIME: 1.08 microseconds      LOCATION 028

CYCLE  MICROCODE                    COMMENT

       MODE 2 <- 1, B <- A+B,       Add A plus B and load the sum into B as
       Fetch PC, CJS ONET MAP       addressed by the IR. Output the present PC
                                    and increment by 1. Jump to the MODE 2
                                    subroutine given by the jump address in the
                                    Mapping PROM to get the proper shift mode.

       MODE 3 <- 1, CJS ONET MAP    Jump to the MODE 3 subroutine given by the
                                    jump address in the Mapping PROM to get the
                                    proper skip mode.

       JP FETCH(003), MAR <- Y, MR  Send out a memory read signal and load the
                                    PC into the MAR.

       HALTC <- 1, IR <- DATA BUS   Load the Data Bus into the IR and check
                                    for a HALT command.

       see FETCH


FIGURE 8.2 RTL and comments for ADD with NO SHIFT and NO SKIP microcode


ADD with LEFT ROTATE and SKIP      TIME: 1.62 microseconds      LOCATION 028

CYCLE  MICROCODE                    COMMENT

       MODE 2 <- 1, B <- A+B        Add A plus B and load the sum into B as
                                    addressed by the IR. Output the present PC
                                    and increment by 1. Jump to the MODE 2
                                    subroutine given by the jump address in the
                                    Mapping PROM to get the proper shift mode.

       MODE 3 <- 1, B <- SHL B,     Jump to the MODE 3 subroutine given by the
       MR, CJS ONET MAP             jump address in the Mapping PROM to get the
                                    proper skip mode. Shift RAM B left one and
                                    rotate the MSB to the LSB. Send out a memory
                                    read signal.

       JP FETCH(001), Fetch PC      Output the present PC and increment by 1 to
                                    accomplish a skip.

       see FETCH


FIGURE 8.3 RTL and comments for ADD with LEFT ROTATE and SKIP microcode




LOAD ACCUMULATOR      TIME: 2.16 microseconds min.      LOCATION 01C

CYCLE  MICROCODE                    COMMENT

       MODE 1 <- 1, CJS ONET MAP    Jump to the MODE 1 subroutine given by the
                                    jump address in the Mapping PROM to get the
                                    proper addressing mode.

       MR, MAR <- Y                 Send out a memory read signal and load the
                                    effective address into the MAR.

       MR, MBRO <- DATA BUS         Send out a memory read signal and load the
                                    DATA BUS into the MBRO.

       ACC <- MBRO, JP FETCH(000)   The MBRO is loaded into the accumulator
                                    addressed by the IR. Jump to FETCH.

       see FETCH


FIGURE 8.4 RTL and comments for LOAD ACCUMULATOR microcode
[Timing diagram. Signal traces labeled in the original include IR LOAD, LOAD MAR, and MODE 1.]


FIGURE 8.5 Timing for LDA 
FETCH      TIME: 1.08 microseconds      LOCATION 000

CYCLE  MICROCODE                    COMMENT

       Fetch PC                     Output the present PC and increment by 1.

       MAR <- Y, MR                 Load PC into MAR and send out a memory
                                    read signal.

       HALTC <- 1, MR,              Checks for a HALT command, sends out a
       IR <- DATA BUS               memory read signal, and loads the IR from
                                    the DATA BUS.

       JSRP INT MAP INTS            If there is an interrupt (INT) then jump to
                                    the interrupt subroutine (INTS) given by the
                                    Pipeline Register, else jump to the next
                                    instruction's microcode which is given by
                                    the jump address in the Mapping PROM.


FIGURE 8.6 RTL and comments for the FETCH microcode
FIGURE 8.7 Timing for FETCH

FIGURE 8.8a Microcode


9.0 TECHNOLOGY TRENDS


The bit slice computer presented in this paper has an
add time of 1.08 microseconds. By the use of more
extensive hardware, the add time could be reduced to less
than 800 ns. The question to be answered is as follows:
Given the present trend in technology, is it worthwhile to
spend the time and money developing a bit slice machine?

If the present one chip 16 bit microprocessors are
surveyed, as done in figure 9.1, the bit slice is 1.1 times
faster than the fastest monolithic processor and 3 times
faster than the slowest. At 800 ns, the bit slice is from
1.5 to 4 times faster than the monolithic microprocessors.
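
The lower end of each range can be checked against the fastest entry in figure 9.1. The sketch below assumes that entry is the 1.19 microsecond shortest instruction time listed in the survey; the function name is illustrative only.

```python
# Shortest instruction time of the fastest monolithic processor in
# figure 9.1, in microseconds (assumed to be the 1.19 us survey entry).
FASTEST_MONOLITHIC_US = 1.19


def speed_ratio(monolithic_us, bit_slice_us):
    """How many times faster the bit slice ADD is than the monolithic part."""
    return monolithic_us / bit_slice_us


ratio_as_built = speed_ratio(FASTEST_MONOLITHIC_US, 1.08)  # design as built
ratio_at_800ns = speed_ratio(FASTEST_MONOLITHIC_US, 0.80)  # extended hardware

print(f"as built:  {ratio_as_built:.2f}x")
print(f"at 800 ns: {ratio_at_800ns:.2f}x")
```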

If there is to be an improvement in bit slice
performance it must come in two areas: the sequencer and
the memory. The microstore and the Am29811 are PROMs. If
these devices were twice as fast, 128 ns would be
cut off the ADD instruction time. If the sequencer were
twice as fast, an additional 160 ns could be cut off.
Applied to the 800 ns design, this would result in an ADD
time of 512 ns, and this would be accomplished without the
use of more advanced strategies such as multiple pipelining.
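
The 512 ns figure is the 800 ns ADD time less the two savings. A short sketch of the subtraction, using only the numbers given in the text (the variable names are illustrative):

```python
# All values in nanoseconds, taken from the text.
BASE_ADD_NS = 800          # ADD time with the more extensive hardware
PROM_SAVING_NS = 128       # microstore and Am29811 PROMs twice as fast
SEQUENCER_SAVING_NS = 160  # sequencer twice as fast

projected_add_ns = BASE_ADD_NS - PROM_SAVING_NS - SEQUENCER_SAVING_NS
print(f"projected ADD time: {projected_add_ns} ns")
```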

Are these speed increases reasonable? For the most
part, these times represent the typical times for the same
devices used in the 800 ns design (that design was done on
a worst case basis). Coupled with the new advance in


Manufacturer (model, technology)        Word size       Instruction time    Comment
                                        (data/instr.)   shortest/longest
                                                        (us)

Data General                            16/16           1.2/29.5            Emulates NOVA instruction set
Fairchild                               16/16                               Emulates NOVA instruction set
Ferranti                                16/16           1.19/5.25           Can do double word operations
General Instrument                      16/16           1.6/4.8             All internal registers can be accumulators
National Semiconductor                  16/16           2.5/5               Architecture intended for data handling
Panafacom M1610 (NMOS)                  16/16           2/6
Texas Instruments TMS9980 (NMOS)                        5.2/49.6            Small version of TMS 9900
Texas Instruments TMS/SBP9900 (NMOS)    16/16           2/34                Emulates 990 mini instructions
Western Digital WD-16 (MOS)             16/16                               Very similar to DEC LSI-11

The survey also tabulates, for each processor: direct addressing range
(words), number of basic instructions, maximum clock frequency (MHz)/phases,
TTL compatibility, on-chip interrupts/levels, number of internal
general-purpose registers, stack registers, prototyping system availability,
voltages required (V), assembly language development system, and time
sharing cross software.

1. Has 8-bit external buses and 16-bit internal buses  2. With maximum clock
3. Except clock lines  4. Standard TTL or MOS circuits will suffice


FIGURE 9.1 A survey of 16 bit general purpose microprocessors


From: Electronic Design, 1977, page 56

tristate buffers for interfacing ECL and TTL, doubling the
speed within one year should be no problem.

There is a limit to how fast a computer can operate,
and the limit is established by the physical dimensions of
the computer and the speed of light. Present bit slice
technology requires 50 to 100 integrated circuits for the
CCU. It will be very hard to package such a CCU in less
than a square foot of area. In this dense configuration,
the maximum distance would be about two feet. Since light
travels approximately one foot every nanosecond, the time
delay for a signal to be sent and its reply received would
be 4 ns. This is equivalent to an extra gate level in the
circuitry. On the other hand, since distances are
measured in mils, the speed of light is not a practical
consideration in the speed of single chip microprocessors.
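
The 4 ns figure is a round trip over the two foot maximum distance. The sketch below reproduces it using the text's approximation of one foot per nanosecond; the 200 mil on-chip path is a hypothetical value chosen only to show the contrast and does not come from the text.

```python
# Approximation used in the text: light travels about one foot per nanosecond.
FEET_PER_NS = 1.0
MILS_PER_FOOT = 12_000  # 1 mil = 0.001 inch, 12 inches per foot


def round_trip_ns(distance_ft, feet_per_ns=FEET_PER_NS):
    """Delay for a signal to travel the distance and for a reply to return."""
    return 2 * distance_ft / feet_per_ns


board_delay_ns = round_trip_ns(2.0)                 # bit slice CCU, two feet
chip_delay_ns = round_trip_ns(200 / MILS_PER_FOOT)  # hypothetical 200 mil path

print(f"board round trip: {board_delay_ns:.1f} ns")
print(f"chip round trip:  {chip_delay_ns:.3f} ns")
```

Even with generous on-chip distances, the chip delay stays a small fraction of a nanosecond, which is why the limit matters for a board-level CCU and not for a monolithic processor.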

It is apparent that the optical limits have been or shortly
will be reached in integrated circuit processing. This can
be seen in figure 9.2. As figure 9.3 shows, there are two
techniques being developed to take over where optical
techniques leave off. The electron beam method promises a
100 fold density increase over the present density, and the
X-ray approach offers a 1000 fold increase in density. At
present the electron beam is the nearest to operational.
From figure 9.3 the full impact of this technology can be
seen. Texas Instruments projects a 32 bit microcomputer
(not microprocessor) by 1983. This microcomputer would

FIGURE 9.2 Future of electron beam technology
From: Altman, 1977.

FIGURE 9.3 VLSI techniques
From: Altman, 1977a


have 32 K words of memory on the chip in addition to the 
CPU. 

The 1983 microprocessor would require a 20 fold
increase in density. This increase in density would
decrease the capacitance of the integrated circuit. Since
capacitance is the major speed killer in MOS circuitry,
the decrease in capacitance by about 20 times would
correspond to a 20 fold increase in speed. To be
conservative and to take into account the extra carry time
for a 32 bit machine, assume that the speed increase is
only by a factor of 10. Since the fastest 16 bit MOS
microprocessor available today has an instruction time
range of 1.2 to 29.5 microseconds, the 1983 32 bit machine
would do an ADD in 120 ns and a divide in 2.95
microseconds.
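
The projection is a simple scaling of today's instruction times by the assumed speed increase. A minimal sketch follows, taking the conservative factor of 10 from the text; the function and variable names are illustrative, and the 20 fold case is included only for comparison.

```python
def projected_time_us(current_us, speedup):
    """Scale a present-day instruction time by an assumed speed increase."""
    return current_us / speedup


CONSERVATIVE_SPEEDUP = 10  # assumption stated in the text
DENSITY_SPEEDUP = 20       # upper bound if speed tracked density exactly

# Shortest (ADD) and longest (divide) times of the fastest 16 bit MOS part.
add_1983_us = projected_time_us(1.2, CONSERVATIVE_SPEEDUP)
divide_1983_us = projected_time_us(29.5, CONSERVATIVE_SPEEDUP)
add_upper_bound_us = projected_time_us(1.2, DENSITY_SPEEDUP)

print(f"1983 ADD:    {add_1983_us * 1000:.0f} ns")
print(f"1983 divide: {divide_1983_us:.2f} us")
print(f"ADD at 20x:  {add_upper_bound_us * 1000:.0f} ns")
```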


The design of complex circuits with electron beam
technology will require even heavier dependence on computer
aided design than present digital LSI designs. With the
required advances in computer techniques and computer use
for the 1980's, it is not hard to imagine a highly
computerized and integrated design and manufacturing system
for VLSI technology. When compared with the cost of
designing a bit slice machine, it may be cheaper to have an
integrated circuit house design a custom microcomputer.
That is, unless a standard system can be used. After all,
most applications that exist today can be accomplished with
a computer capable of a 120 ns ADD.

While bit slice has a definite advantage now, the
advantage will fade over the next decade unless the
circuitry can be integrated into larger and faster slices.
The problem is to get a faster technology into a smaller
area. It is not an easy problem because most technologies
require more power to go faster. As the size of the chip
is reduced, the power per unit area goes up. In the end,
the monolithic microprocessor will probably win out, but
until then the bit slice does offer some definite
advantages.




10.0 SUMMARY 


There are two broad areas of interest in this computer 
design. One area is cost, and the other area is 
performance. Cost includes the actual hardware cost, 
construction time, hardware design time, and firmware cost. 
In the area of performance we are concerned with the 
aspects of speed and flexibility. 

The design of this system took several months of
gathering and reading the material on bit slice technology
and on the instruction sets for the PDP-11, NOVA, and PACE.
Then came several design iterations as I tried to
assimilate all the material. Finally, it took about six
weeks to design the computer hardware.

In the construction phase, not counting the time it
took to strip the wire wrap board, it took about two and a
half weeks to wire wrap and document the board. An
additional week was required to find the wiring mistakes
and other problems with the wire wrap board.

As for the limited instruction set reduced to firmware,
two weeks were required to write the original RTL programs,
and two additional weeks were needed to write the binary
code. Of these instructions, only the FETCH, LDA, ADD
(original), ADD without shifts or skips, and ADD with left
rotate and skip always were checked out. The total time for
the checkout was probably no more than two weeks, but the
problem of programming the PROMs at work and checking the
firmware at home added a great deal of time to the
procedure.


In the performance area, the results were not as good 


could significantly reduce the execution time. It has
become apparent that this is not a one man job. Rather,
the task should be attacked by a well coordinated design
group. The complete design and construction of this
computer could well take two man years. To avoid the
problems of a project of this size, such as the
demoralization that comes from chipping away at a large
problem with no apparent progress, at least six people
should be used. Two people should be used in the hardware
area: one person should design the CCU and the other
should design the memory and peripheral interface. Two
people should be used to design the firmware, and one person
should design the monitors and assemblers so something can
be done with the machine once the design phase is finished.
Finally, one person is needed to bring the total design
together into one cohesive effort. This person should be
able to understand both the hardware and the software
aspects of the computer so he can coordinate the two
efforts towards the same goals and provide help in each
area when it is needed.